Sorting items in a
linked list
Purpose
This document describes a sorting algorithm that is very
efficient and well suited for single linked lists. It is a stable
merging sorting algorithm with O(N.log(N)) worst case performance
and O(N) best case performance.
Introduction: linked lists
If you have a set of objects within a program you can store them
in a few different ways. Arrays are the most common way to store a
set of objects but other data structures are also possible, like
linked lists and trees, each having their own
advantages/disadvantages.
Each object needs to have a Next property
that points to the next object in line. The list can be passed on by
passing the first item of the list. The second item can be found via
the first item via Next, the third via the
Next property of the second item etc etc.
The end of the list is reached if the Next
property is unassigned. This is called a single linked list. If each
object also has a Previous property it is
called a double linked list.
Figure 1. A sample single linked list.
Insertion in a linked list is extremely fast and no array or
whatever is needed to keep the list. Traversing the list is easy but
a linked list has no fast random access: get item number 123. If the
list is a double linked list, deletions are very fast too.
Introduction: Sorting algorithms
There are numerous different sorting algorithms. Most sorting
algorithms describe sorting in an array. They tend to sort items by
comparing two items and swapping their places (or not). Some are
more efficient than others. For example the simple bubble sort
algorithm tends to become four times slower when the number of items
doubles in the array. The quicksort algorithm does better, but can
exhibit the same behavior on some specific inputs.
The efficiency of a sorting algorithm is usually given by the
expected and worst case amount of compares needed to sort N items.
For bubblesort the expected amount of compares is O(N.N), for
quicksort it is O(N.log(N)). Especially if N is large quicksort
becomes faster and faster compared to Bubblesort since log(N) is
smaller than N if N > 1. Quicksort, in worst case, also
needs O(N.N) compares and swaps.
Most algorithms are not well suited for single linked lists. They
assume that you have full random access or forward and backward
access. In a single linked list you only have access to the first
item and from there you only can traverse forward, scanning the
items one by one; no random access whatsoever.
Merge sort
A merging sort algorithm merges small sorted lists to larger
sorted lists. Merging two sorted lists is quite a simple algorithm:
as long as both lists contain items, compare the top of each list
with each other and place the smallest item on the bottom of another
list. If one list is exhausted, the new list can be placed on top
that one and we end up with a single sorted list.
The algorithm starts with N 'lists' each containing a single
item. Single item lists are always sorted by nature. The first pass
merges those N lists to N / 2 lists of length 2. The next
pass merges it to N / 4 lists of length 4. After log(N)
merge passes we end up with a single merged list of all N items. In
each pass the algorithm must do N compares max. So the algorithm
must do O(N.log(N)) on average and worst case.
Here the process is depicted in a figure:
Figure 2. Merging four 3 item lists into
one.
Keeping a stack of merged lists
If you use the algorithm as described above you need to keep
track of N lists max. This means you need quite an array for
bookkeeping, or another reference in each object. There is a simple
trick that relieves this bookkeeping to log(N) + 1 lists
max.
If we keep a small array of lists, the stack, with the first slot
can contain only lists of length 1. The second slot can only contain
a list of length 2, the third slot can only contain lists of length
4 etc. If we want to place a list of 1 item in the first slot, we
succeed if it was empty or, if do a merge with the list in the first
slot and empty the first slot. We now have a list of size 2 and can
do the same with the second slot etc, until we reach a slot without
a list. Since each slot in the small array keeps the double amount
of items of the previous slot we only need log(N) + 1
slots in the array. A general maximum of 32 would be enough for 32
bit processors.
After all items are added, we end up with 1 or more slots being
occupied by lists. We first scan for the first list we can find and
keep on scanning the whole array and propagate that list to the next
item each step, doing a merge if that slot was occupied. We end with
the full sorted list in the last slot of the array.
It makes the algorithm even faster. Since we access recently
accessed items more often the caches of modern processors can do a
better job and enhance the performance even further.
Figure 3. List-sizes while merge sorting 5 items
with a stack.
Using natural order
Instead of creating single item lists to start with, we can use
the natural order of the items in the unsorted list. The first item
is always picked up. As long as there is an item and that item is
equal or larger that the last item found, we can append it. If not,
we keep the list found so far and start again with the smaller item
found. This adds a maximum of N - 1 compares to the
algorithm. If we keep a special case for two items and swap those in
order we make an extra N / 2 compares compared to the
situation without using natural order.
On the other hand if the list was already sorted we only need
N - 1 compares to verify that and we're done. The best
case performance is O(N), and the worst case still O(N.log(N)). This
improvement also works great if the unsorted list contains sorted
streaks.
Some Delphi code implementing the sorting algorithm
If you can 'read' Delphi code you can read a quite optimized
implementation of the sorting algorithm below.
TBaseItemComparator = function(ItemA, ItemB: TBaseItem): Boolean;
procedure TBaseList.Sort(LessOrEqual: TBaseItemComparator;
UseNaturalOrder: Boolean);
var
i: Integer;
Item, Next, LastItem, StackItem, Sentinel: TBaseItem;
Stack: array [..] of TBaseItem;
begin
if FCount <= then Exit;
Sentinel := TBaseItem.Create;
try
for i := Low(Stack) to High(Stack) do Stack[i] := nil;
Item := FFirst;
if UseNaturalOrder then begin
repeat
Next := Item.FNext;
if Assigned(Next) then begin
if LessOrEqual(Item,Next) then begin
repeat
StackItem := Next;
Next := Next.FNext;
until (not Assigned(Next)) or (not LessOrEqual(StackItem,Next));
end else begin
StackItem := Item;
Item := Next;
Next := Next.FNext;
Item.FNext := StackItem;
end;
StackItem.FNext := nil;
end else begin
Item.FNext := nil;
end;
i := ;
while Assigned(Stack[i]) do begin
LastItem := Sentinel;
StackItem := Stack[i];
Stack[i] := nil;
repeat
if LessOrEqual(StackItem,Item) then begin
LastItem.FNext := StackItem;
LastItem := StackItem;
StackItem := StackItem.FNext;
if Assigned(StackItem) then Continue;
LastItem.FNext := Item;
Break;
end else begin
LastItem.FNext := Item;
LastItem := Item;
Item := Item.FNext;
if Assigned(Item) then Continue;
LastItem.FNext := StackItem;
Break;
end;
until False;
Item := Sentinel.FNext;
Inc(i);
end;
Stack[i] := Item;
Item := Next;
until not Assigned(Item);
end else begin
repeat
Next := Item.FNext;
Item.FNext := nil;
i := ;
while Assigned(Stack[i]) do begin
LastItem := Sentinel;
StackItem := Stack[i];
Stack[i] := nil;
repeat
if LessOrEqual(StackItem,Item) then begin
LastItem.FNext := StackItem;
LastItem := StackItem;
StackItem := StackItem.FNext;
if Assigned(StackItem) then Continue;
LastItem.FNext := Item;
Break;
end else begin
LastItem.FNext := Item;
LastItem := Item;
Item := Item.FNext;
if Assigned(Item) then Continue;
LastItem.FNext := StackItem;
Break;
end;
until False;
Item := Sentinel.FNext;
Inc(i);
end;
Stack[i] := Item;
Item := Next;
until not Assigned(Item);
end;
i := ;
while not Assigned(Stack[i]) do Inc(i);
Item := Stack[i];
Inc(i);
while i < Length(Stack) do begin
if Assigned(Stack[i]) then begin
LastItem := Sentinel;
StackItem := Stack[i];
Stack[i] := nil;
repeat
if LessOrEqual(StackItem,Item) then begin
LastItem.FNext := StackItem;
LastItem := StackItem;
StackItem := StackItem.FNext;
if Assigned(StackItem) then Continue;
LastItem.FNext := Item;
Break;
end else begin
LastItem.FNext := Item;
LastItem := Item;
Item := Item.FNext;
if Assigned(Item) then Continue;
LastItem.FNext := StackItem;
Break;
end;
until False;
Item := Sentinel.FNext;
end;
Inc(i);
end;
FFirst := Item;
LastItem := nil;
repeat
Item.FPrev := LastItem;
LastItem := Item;
Item := Item.FNext;
until not Assigned(Item);
FLast := LastItem;
finally
Sentinel.Free;
end;
end;
|